Abstract. We present a lattice algorithm specifically designed for some classical applications of lattice
reduction. The applications are for lattice bases with a generalized knapsack-type structure, where the
target vectors are boundably short. For such applications, the complexity of the algorithm improves
traditional lattice reduction by replacing some dependence on the bit-length of the input vectors by
some dependence on the bound for the output vectors. If the bit-length of the target vectors is unrelated
to the bit-length of the input, then our algorithm is only linear in the bit-length of the input entries,
which is an improvement over the quadratic complexity floating-point LLL algorithms. To illustrate
the usefulness of this algorithm we show that a direct application to factoring univariate polynomials
over the integers leads to the first complexity bound improvement since 1984. A second application is
algebraic number reconstruction, where a new complexity bound is obtained as well.

1 Introduction 

Lattice reduction algorithms are essential tools in computational number theory and cryptography.
A lattice is a discrete subset of $\mathbb{R}^n$ that is also a $\mathbb{Z}$-module. The goal of lattice reduction is to find
a 'nice' basis for a lattice, one which is near orthogonal and composed of short vectors. Since the
publication of the 1982 Lenstra, Lenstra, Lovász [15] lattice reduction algorithm many applications
have been discovered, such as polynomial factorization [15, 11] and attacking several important
public-key cryptosystems including knapsack cryptosystems [23], RSA under certain settings [7],
and DSA and some signature schemes in particular settings [12]. One of the important features
of the LLL algorithm was that it could approximate the shortest vector of a lattice in polynomial
time. This is valuable because finding the exact shortest vector in a lattice is provably NP-hard [1,
18]. Given a basis $\mathbf{b}_1, \ldots, \mathbf{b}_d \in \mathbb{R}^n$ which satisfies $\|\mathbf{b}_i\| \le X$ for all $i$, the LLL algorithm has a running
time of $O(d^5 n \log^3 X)$ using classical arithmetic. Recently there has been a resurgence of lattice
reduction work thanks to Nguyen and Stehlé's $L^2$ algorithm [20, 21] which performs lattice reduction
in $O(d^4 n \log X [d + \log X])$ CPU operations. The primary result of $L^2$ was that the dependence on
$\log X$ is only quadratic, allowing for improvement on applications using large input vectors.
The main result: Many applications of LLL (see the applications section below) involve finding 
a vector in a lattice whose norm is known to be small in advance. In such cases it can be more 
efficient to reduce a basis of a sub-lattice which contains all targeted vectors than reducing a basis 
of the entire lattice. In this paper we target short vectors in specific types of input lattice bases 
which we call knapsack-type bases. The new algorithm introduces a search parameter $B$ which the
user provides. This parameter is used to bound the norms of targeted short vectors. To be precise: 

The rows of the following matrices represent a knapsack-type basis 
The specifications of our algorithm are as follows. It takes as input a knapsack-type basis $\mathbf{b}_1, \ldots, \mathbf{b}_d \in \mathbb{Z}^n$
of a lattice $L$ with $\|\mathbf{b}_i\| \le X$ for all $i$ and a search parameter $B$; it returns a reduced basis generating
a sub-lattice $L' \subseteq L$ such that if $\mathbf{v} \in L$ and $\|\mathbf{v}\| \le B$ then $\mathbf{v} \in L'$.
Our algorithm has the following complexity bounds for various input: 



Input                          Complexity bound
No $P_j$                       $O(d^2 (n + d^2)(d + \log B)[\log X + n(d + \log B)])$
No restriction on $P_j$        $O(d^4 (d + \log B)[\log X + d(d + \log B)])$
Many $P_j$ large w.r.t. $B$    $O(d r^3 (r + \log B)[\log X + d(r + \log B)])$



These complexity bounds have several distinct parameters, so a comparison with other algorithms 
is a bit subtle. The most significant parameter to explore is B, the search parameter. If one selects 
$B = X$ then our algorithm will return a reduced basis of $L' = L$ in $O(d^2 n(n + d^2)[d^2 + \log^2 X])$.
This is an interesting result because our algorithm, like the original LLL and the $L^2$ algorithms,
uses switches and size-reductions of the vectors to arrive at a reduced basis. The fact that we return
a reduced basis with a complexity so similar to $L^2$ implies that there are alternative orderings on
the switches which lead to similar performance. 

When using a smaller value of B than X the algorithm will return either: 

- A reduced basis of a sub-lattice $L'$ which contains all vectors of norm $\le B$. This sub-lattice
may be different than the sub-lattice, $L''$, generated by all vectors of norm $\le B$, and we do
have $L'' \subseteq L' \subseteq L$. Also, because the basis of $L'$ is reduced, we have an approximation of the
shortest non-zero vector of $L$.

- The empty set, in which case the algorithm has proved that no non-zero vector of norm $\le B$
exists in $L$.

We offer the following complexity comparison with $L^2$ [20] for some values of $B$ on square input
lattices (with $P_j$'s). When a column has a non-zero $P_j$ we can reduce the $x_{i,j}$ modulo $P_j$. Thus,
without loss of generality, we may assume that $P_j$ is the largest element in its column. Note that
$r = d - N$.
Algorithm / choice of $B$    Complexity bound
$L^2$ [20]                   $O(d^6 \log X + d^5 \log^2 X)$
$B = O(X)$                   $O(d^7 + d^5 \log^2 X)$
$B = O(X^{1/d})$             $O(d^2 r^5 + r^3 \log^2 X)$
$B = 2^{O(d)}$               $O(d^4 r^3 + d^2 r^3 \log X)$
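As a check on this comparison, the three entries for our algorithm can be recovered from the general bounds stated earlier in this section by substituting the value of $\log B$. The following is only a sketch of that arithmetic, under the assumptions that the lattice is square ($n = d$), that $r \le d$, and that constant factors are dropped. The first line uses the bound with no restriction on the $P_j$ (since $B$ close to $X$ means the $P_j$ are not large enough), and the other two use the bound for many $P_j$ large with respect to $B$.

```latex
\begin{align*}
B = O(X):\quad
  & d^4 (d + \log X)\,[\log X + d(d + \log X)]
    = O\!\left(d^5 (d + \log X)^2\right)
    = O\!\left(d^7 + d^5 \log^2 X\right),\\
B = O(X^{1/d}):\quad
  & d r^3 \left(r + \tfrac{\log X}{d}\right)
    \left[\log X + d\left(r + \tfrac{\log X}{d}\right)\right]
    = O\!\left(r^3 (dr + \log X)^2\right)
    = O\!\left(d^2 r^5 + r^3 \log^2 X\right),\\
B = 2^{O(d)}:\quad
  & d r^3 (r + d)\,[\log X + d(r + d)]
    = O\!\left(d^2 r^3 [\log X + d^2]\right)
    = O\!\left(d^4 r^3 + d^2 r^3 \log X\right).
\end{align*}
```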



It should be noted that [20] explores running times of $L^2$ on knapsack lattices with $N = 1$ (such
lattice bases are used in [9]). In this case, $L^2$ will have complexity $O(d^5 \log X + d^4 \log^2 X)$.
Our approach: We reduce the basis gradually, using many separate calls to another lattice reduction
algorithm. To get the above complexity results we chose H-LLL [19] but there are many
suitable lattice reduction algorithms we could use instead such as [13, 15, 20, 24]. For more details
on why we made this decision see the discussion in section 5.

There are three important features to our approach. First, we approach the problem column by
column: we begin with the $r \times r$ identity, and with each iteration of the algorithm we expand our
scope to include one more column of the $x_{i,j}$. Next, within each column iteration, we reduce the
new entries bit by bit, starting with a reduction using only the most significant bits, then gradually
including more and more bits of data. Third, we allow for the removal of vectors which have become
too large. This allows us to always work on small entries, but restricts us to a sub-lattice.

The proof of the algorithm's complexity is essentially a study of two quantities, the product 
of the Gram-Schmidt lengths of the current vectors which we call the active determinant and an 
energy function which we call progress. We amortize all of the lattice reduction costs using progress, 
and we bound the number of iterations and number of vectors using the active determinant. Neither 
of these quantities is impacted by the choice of lattice reduction algorithm. 

Applications of the algorithm: As evidence for the usefulness of this new approach we show 
two new complexity results based on applications of the main algorithm. The first result is a new 
complexity for the classical problem of factoring polynomials in $\mathbb{Z}[x]$. If the polynomial has degree
$N$, coefficients of bit-length at most $\log A$, and when reduced modulo a prime $p$ has $r$ irreducible factors
then we prove a complexity of $O(N^3 r^4 + N^2 r^4 \log A)$ for the lattice reduction costs using classical
arithmetic. One must also add the cost of multi-factor Hensel lifting which is $O(N^6 + N^4 \log^2 A)$
ignoring the small terms $\log r$ and $\log^2 p$ (see [8] for details). This is the first improvement over
the Schönhage bound given in 1984 [25] of $O(N^8 + N^5 \log^3 A)$.

The second new complexity result comes in the problem of reconstructing a minimal polynomial 
from a complex approximation of the algebraic number. In this application we know $O(d^2 + d \log H)$
bits of an approximation of some complex root of an unknown polynomial $h(x)$ with degree $d$ and
with maximal coefficient of absolute value $\le H$. Then our algorithm can be used to find the
coefficients of $h(x)$ in $O(d^7 + d^5 \log^2 H)$ CPU operations.

Other problems of common interest which might be impacted by our algorithm include integer 
relation finding (where N = 1) and simultaneous Diophantine approximation of several real numbers 
[10, 6] (where r = 1). 

Notations: All costs are given for the bit-complexity model. A standard row vector will be denoted
$\mathbf{v}$, $\mathbf{v}[i]$ represents the $i$th entry of $\mathbf{v}$, $\mathbf{v}[i, \ldots, j]$ a vector consisting of all entries of $\mathbf{v}$ from the $i$th
entry to the $j$th entry, and $\mathbf{v}[-1]$ the final entry of $\mathbf{v}$. Also we will use $\|\mathbf{w}\|_\infty$ as the max-norm or
the largest absolute value of an entry in the vector $\mathbf{w}$, $\|\mathbf{w}\| := \sqrt{\sum_i (\mathbf{w}[i])^2}$ which we call the norm
of $\mathbf{w}$, and $\mathbf{w}^T$ as the transpose of $\mathbf{w}$. The scalar product will be denoted $\mathbf{v} \cdot \mathbf{w} := \sum_i \mathbf{v}[i] \cdot \mathbf{w}[i]$. For
a matrix $M$ we will use $M[1, \ldots, k]$ to denote the first $k$ columns of $M$. The $n$ by $n$ identity matrix
will be denoted $I_{n \times n}$. For a real number $x$ we use $\lceil x \rceil$ and $\lfloor x \rfloor$ to denote the closest integer $\ge x$
and $\le x$ respectively.

Road map: In section 2 we give a brief introduction to lattice reduction algorithms. In section 3 we 
present the central algorithm of the paper and prove its correctness. In section 4 we prove several 
important features by studying quasi-invariants we call the active determinant and progress. In 
this section we treat lattice reduction as a black-box algorithm. In section 5 we prove the overall 
complexity and other important claims about the new algorithm by fixing a choice for a standard 
lattice reduction algorithm. In section 6 we offer new complexity results for factoring polynomials 
in Z[x] and algebraic number reconstruction. 
2 Background on lattice reduction



The purpose of this section is to present some facts from [15] that will be needed throughout the 
paper. For a more general treatment of lattice reduction see [17]. 

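A lattice, $L$, is a discrete subset of $\mathbb{R}^n$ that is also a $\mathbb{Z}$-module. Let $\mathbf{b}_1, \ldots, \mathbf{b}_d \in L$ be a basis
of $L$ and denote $\mathbf{b}_1^*, \ldots, \mathbf{b}_d^* \in \mathbb{R}^n$ as the Gram-Schmidt orthogonalization over $\mathbb{R}$ of $\mathbf{b}_1, \ldots, \mathbf{b}_d$.

Let $\delta \in (1/4, 1]$ and $\eta \in [1/2, \sqrt{\delta})$. Let $l_i = \log_{1/\delta} \|\mathbf{b}_i^*\|^2$, and denote $\mu_{i,j} = \frac{\mathbf{b}_i \cdot \mathbf{b}_j^*}{\mathbf{b}_j^* \cdot \mathbf{b}_j^*}$. Note that
$\mathbf{b}_i, \mathbf{b}_i^*, l_i, \mu_{i,j}$ will change throughout the algorithm sketched below.

Definition 1. $\mathbf{b}_1, \ldots, \mathbf{b}_d$ is LLL-reduced if $\|\mathbf{b}_i^*\|^2 \le \frac{1}{\delta - \mu_{i+1,i}^2} \|\mathbf{b}_{i+1}^*\|^2$ for $1 \le i < d$ and $|\mu_{i,j}| \le \eta$
for $1 \le j < i \le d$.

In the original paper the values for $(\delta, \eta)$ were chosen as $(3/4, 1/2)$ so that $\frac{1}{\delta - \eta^2}$ would simply
be 2.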

Algorithm 1 (Rough sketch of LLL-type algorithms)

Input: A basis $\mathbf{b}_1, \ldots, \mathbf{b}_d$ of a lattice $L$.
Output: An LLL-reduced basis of $L$.

A - $\kappa := 2$

B - while $\kappa \le d$ do:

   1 - (Gram-Schmidt over $\mathbb{Z}$). By subtracting suitable $\mathbb{Z}$-linear combinations of $\mathbf{b}_1, \ldots, \mathbf{b}_{\kappa-1}$ from
       $\mathbf{b}_\kappa$ make sure that $|\mu_{\kappa,i}| \le \eta$ for $i < \kappa$.

   2 - (LLL Switch). If interchanging $\mathbf{b}_{\kappa-1}$ and $\mathbf{b}_\kappa$ will decrease $l_{\kappa-1}$ by at least 1 then do so.

   3 - (Repeat). If not switched $\kappa := \kappa + 1$, if switched $\kappa := \max(\kappa - 1, 2)$.

That the above algorithm terminates, and that the output is LLL-reduced, was shown in [15]. Step
B1 has no effect on the $l_i$. In step B2 the only $l_i$ that change are $l_{\kappa-1}$ and $l_\kappa$. The following lemmas
present some standard facts which we will need.

Lemma 1. An LLL switch cannot increase $\max(l_1, \ldots, l_d)$, nor can it decrease $\min(l_1, \ldots, l_d)$.

Lemma 2. If $\|\mathbf{b}_d^*\| > B$ then any vector in $L$ with norm $\le B$ is a $\mathbb{Z}$-linear combination of
$\mathbf{b}_1, \ldots, \mathbf{b}_{d-1}$.

In other words, if the current basis of the lattice is $\mathbf{b}_1, \ldots, \mathbf{b}_d$ and if the last vector has sufficiently
large G-S length then, provided the user is only interested in elements of $L$ with norm $\le B$, the
last basis element can be removed.

Lemma 2 follows from the proof of [15, Eq. (1.11)], and is true regardless of whether $\mathbf{b}_1, \ldots, \mathbf{b}_d$
is LLL-reduced or not. However, if one chooses an arbitrary basis $\mathbf{b}_1, \ldots, \mathbf{b}_d$ of some lattice $L$,
then it is unlikely that the last vector has large G-S length (after all, $\|\mathbf{b}_d^*\|$ is the norm of $\mathbf{b}_d$
reduced modulo $\mathbf{b}_1, \ldots, \mathbf{b}_{d-1}$ over $\mathbb{R}$). The effect of LLL reduction is to move G-S length towards
later vectors.
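The rough sketch in Algorithm 1 can be made concrete as follows. This is a minimal, unoptimized illustration rather than any of the cited implementations: it assumes $(\delta, \eta) = (3/4, 1/2)$, uses exact rational arithmetic, and recomputes the whole Gram-Schmidt orthogonalization after every change. The function names `gram_schmidt`, `norm2` and `lll` are hypothetical names chosen for this sketch.

```python
from fractions import Fraction


def gram_schmidt(basis):
    """Return (bstar, mu): the GSO vectors and coefficients mu[i][j], j < i."""
    bstar, mu = [], []
    for b in basis:
        v = [Fraction(x) for x in b]
        row = []
        for w in bstar:
            m = sum(Fraction(x) * y for x, y in zip(b, w)) / sum(y * y for y in w)
            row.append(m)
            v = [x - m * y for x, y in zip(v, w)]
        bstar.append(v)
        mu.append(row)
    return bstar, mu


def norm2(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL with eta = 1/2, following steps A, B1, B2, B3 of the sketch."""
    b = [list(row) for row in basis]
    d = len(b)
    k = 1  # 0-based index; corresponds to kappa = 2 (step A)
    while k < d:
        # Step B1: size-reduce b[k] against b[k-1], ..., b[0].
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(b)
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
        bstar, mu = gram_schmidt(b)
        # Step B2: keep the order if a switch would not shrink ||b*_{k-1}||^2
        # by a factor of at least delta; otherwise switch.
        if norm2(bstar[k]) > (delta - mu[k][k - 1] ** 2) * norm2(bstar[k - 1]):
            k += 1                      # step B3, not switched
        else:
            b[k - 1], b[k] = b[k], b[k - 1]
            k = max(k - 1, 1)           # step B3, switched
    return b


reduced = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

Recomputing the orthogonalization at every step keeps the correspondence with steps B1 to B3 easy to follow; practical implementations such as those in [20, 19] update it incrementally and use floating-point approximations.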
3 Main algorithm



In this section we present the central algorithm of the paper and a proof of its correctness. Our 
algorithm is a kind of wrapper for other standard lattice reduction algorithms. We try to present it 
as independently as possible of the choice of lattice reduction algorithm. In order to be general we 
must first outline the features that we require of the chosen lattice reduction algorithm. Our first 
requirement is that the output satisfy the following slightly weakened version of LLL-reduction. 

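Definition 2. Let $L \subseteq \mathbb{R}^n$ be a lattice and $\mathbf{b}_1, \ldots, \mathbf{b}_s \in L$ be $\mathbb{R}$-linearly independent. We call
$\mathbf{b}_1, \ldots, \mathbf{b}_s$ an $\alpha$-reduced basis of $L$ if 1, 2, and 3a hold, and an $(\alpha, B)$-reduced sequence (basis of a
sub-lattice) if 1, 2, and 3b hold:

1. $\|\mathbf{b}_i^*\| \le \alpha \|\mathbf{b}_{i+1}^*\|$ for $i = 1 \ldots s - 1$.

2. $\|\mathbf{b}_i^*\| \le \|\mathbf{b}_i\| \le \alpha^{i-1} \|\mathbf{b}_i^*\|$ for $i = 1 \ldots s$.

3. (a) $L = \mathbb{Z}\mathbf{b}_1 + \cdots + \mathbb{Z}\mathbf{b}_s$.

   (b) $\|\mathbf{b}_s^*\| \le B$ and for every $\mathbf{v} \in L$ with $\|\mathbf{v}\| \le B$ we have $\mathbf{v} \in \mathbb{Z}\mathbf{b}_1 + \cdots + \mathbb{Z}\mathbf{b}_s$.

The original LLL algorithm from [15] returns output with $\alpha = \sqrt{2}$, $L^2$ from [20] with $\alpha =
\sqrt{1/(\delta - \eta^2)}$ for appropriate choices of $(\delta, \eta)$, and H-LLL from [19] with an $\alpha$ determined by
appropriate $(\delta, \eta, \theta)$ (see [19] for the explicit value). We may now also make a useful observation about an $(\alpha, B)$-reduced sequence.

Lemma 3. If the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_s$ form an $(\alpha, B)$-reduced sequence and we let $\mathbf{b}_1^*, \ldots, \mathbf{b}_s^*$ represent
the GSO, then the following properties are true:

- $\|\mathbf{b}_i^*\| \le \alpha^{s-i} B$ for all $i$.

- $\|\mathbf{b}_i\| \le \alpha^{s-1} B$ for all $i$.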

We use the concept of $\alpha$-reduction as a means of making proofs which are largely independent
of which lattice reduction algorithm a user might choose. For a basis which is $\alpha$-reduced, a small
value of $\alpha$ implies a strong reduction. In our algorithm we use the variable $\alpha$ as the worst-case
guarantee of reduction quality. We make our proofs (specifically Lemma 8 and Theorem 3) assuming
an $\alpha \ge \sqrt{4/3}$. This value is chosen because [15, 20, 19] cannot guarantee a stronger reduction.
An $(\alpha, B)$-reduced basis is typically made from an $\alpha$-reduced basis by removing trailing vectors
with large G-S length. The introduction of $(\alpha, B)$-reduction does not require creating new lattice
reduction algorithms, just the minor adjustment of detecting and removing vectors above a given
G-S length.

Algorithm 2 LLL_with_removals
Input: $\mathbf{b}_1, \ldots, \mathbf{b}_s \in \mathbb{R}^n$ and $B \in \mathbb{R}$.
Output: $\mathbf{b}'_1, \ldots, \mathbf{b}'_{s'} \in \mathbb{R}^n$ $(\alpha, B)$-reduced, $s' \le s$.

Procedure: Use any lattice reduction procedure which returns an $\alpha$-reduced basis and follows
Assumption 1. However, when it is discovered that the final vector has G-S length provably $> B$,
remove that final vector (deal with it no further).
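The removal step can be illustrated in isolation. The sketch below is not the interleaved procedure described above: it assumes the basis has already been reduced by some other routine and only post-processes it, dropping trailing vectors whose exact Gram-Schmidt length exceeds $B$, as Lemma 2 permits. The names `gs_norms_squared` and `remove_long_tail` are hypothetical.

```python
from fractions import Fraction


def gs_norms_squared(basis):
    """Squared Gram-Schmidt lengths ||b_i^*||^2 of the rows, in exact arithmetic."""
    bstar, out = [], []
    for b in basis:
        v = [Fraction(x) for x in b]
        for w in bstar:
            m = sum(Fraction(x) * y for x, y in zip(b, w)) / sum(y * y for y in w)
            v = [x - m * y for x, y in zip(v, w)]
        bstar.append(v)
        out.append(sum(x * x for x in v))
    return out


def remove_long_tail(reduced_basis, B):
    """Drop trailing vectors while the last G-S length is > B (cf. Lemma 2)."""
    basis = [list(row) for row in reduced_basis]
    while basis and gs_norms_squared(basis)[-1] > Fraction(B) ** 2:
        basis.pop()
    return basis


kept = remove_long_tail([[1, 0, 0], [0, 1, 0], [0, 0, 100]], 5)
```

Comparing squared lengths against $B^2$ avoids square roots, so the test is exact for rational input; a floating-point reduction would instead need the safety margin discussed after Assumption 1.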

Assumption 1 The lattice reduction algorithm chosen for LLL_with_removals must use switches of
consecutive vectors during its reduction process. These switches must have the following properties:

1. There exists a number $\gamma > 1$ such that every switch of vectors $\mathbf{b}_i$ and $\mathbf{b}_{i+1}$ increases $\|\mathbf{b}_{i+1}^*\|$
by a factor provably $\ge \gamma$.

2. The quantity $\max\{\|\mathbf{b}_i^*\|, \|\mathbf{b}_{i+1}^*\|\}$ cannot be increased by switching $\mathbf{b}_i$ and $\mathbf{b}_{i+1}$.

3. No steps other than switches can affect the G-S norms $\|\mathbf{b}_1^*\|, \ldots, \|\mathbf{b}_s^*\|$.

Assumption 1 is not very strong as [15, 20, 19, 24, 27] and the sketch in Algorithm 1 all conform
to these assumptions. We do not allow for the extreme case where $\gamma = 1$, although running times
have been studied in [2, 16]. It should also be noted that in the floating point lattice reduction
algorithms $\|\mathbf{b}_i^*\|$ is only known approximately. In this case one must only remove vectors whose
approximate G-S length is sufficiently large to ensure that the exact G-S length is $> B$.

The format of the input matrices was given in section 1. A search parameter B is given to 
bound the norm of the target vectors. The algorithm performs its best when B is small compared 
to the bit-length of the entries in the input matrix, although B need not be small for the algorithm 
to work. 

Definition 3. We say the $P_j$ are large enough if:

$|P_j| \ge 2\alpha^{4r+4k+2} B^2$ for all but $k = O(r)$ values of $j$.    (1)

Note that if $N = O(r)$ then the $P_j$ are trivially large enough. However, for applications where $N$
is potentially much larger than $r$ this becomes a non-trivial condition. In this case having $B$ close
to $X$ means that the $P_j$'s are not large enough.

In the following algorithm we will gradually reduce the input basis. This will be done one column
at a time, similar to the experiments in [3, 6]. The current basis vectors are denoted $\mathbf{b}_i$ and we will
use $M$ to represent the matrix whose rows are the $\mathbf{b}_i$. We will use the notation $\mathbf{x}_j$ to represent the
column vector $(x_{1,j}, \ldots, x_{r,j})^T$.

The matrix $M$ will begin as $I_{r \times r}$, and we will adjoin $\mathbf{x}_1$ and a new row $(0, P_1)$ if appropriate.
Each time we add a column $\mathbf{x}_j$ we will need to calculate the effects of prior lattice reductions on
the new $\mathbf{x}_j$. We use $\mathbf{y}_j$ to represent a new column of entries which will be adjoined to $M$. In fact
$\mathbf{y}_j = M[1, \ldots, r] \cdot \mathbf{x}_j$. Before adjoining the entries we also scale them by a power of 2, to have
smaller absolute values. This keeps the entries in $M$ at a uniform absolute value. The central loop
of the algorithm is the process of gradually using more and more bits of $\mathbf{y}_j$ until every entry in
$M$ is again an integer. No rounding is performed: we use rational arithmetic on the last column of
each row. Throughout the algorithm the number of rows of $M$ will be changing. We let $s$ be the
current number of rows of $M$. If (1) is satisfied for some $k = O(r)$ then we can actually bound $s$
by $2r + 2k + 1$. We use $c$ as an a priori upper bound on $s$, either $c := 2r + 2k + 1$ or $c := r + N$. The
algorithm has better performance when $c$ is small. We let $L$ represent the lattice generated by the
rows of $A$.

Algorithm 3 Gradual_LLL

Input: A search parameter, $B \ge \sqrt{5}$ with $B \in \mathbb{Q}$, an integer knapsack-type matrix, $A$, and an $\alpha \ge \sqrt{4/3}$.

Output: An $(\alpha, B)$-reduced basis $\mathbf{b}_1, \ldots, \mathbf{b}_s$ of a sub-lattice $L'$ in $L$ with the property that if
$\mathbf{v} \in L$ and $\|\mathbf{v}\| \le B$ then $\mathbf{v} \in L'$.

The Main Algorithm:
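1 - if (1) holds set $c := \min(2r + 2k + 1, r + N)$

2 - $s := r$; $M := I_{r \times r}$

3 - for $j = 1 \ldots N$ do:
    a - $\mathbf{y}_j := M[1, \ldots, r] \cdot \mathbf{x}_j$; $\ell := \lfloor \log_2(\max\{|P_j|, \|\mathbf{y}_j\|_\infty, 2\}) \rfloor$

    b - $M := \begin{pmatrix} 0 & P_j / 2^{\ell} \\ M & \mathbf{y}_j / 2^{\ell} \end{pmatrix}$; if $P_j \ne 0$ then $s := s + 1$ else remove zero row

    c - while ($\ell \ne 0$) do:
        i - $\mathbf{y}_j := 2^{\ell} \cdot M \cdot [0, \ldots, 0, 1]^T$; $\ell := \max\{0, \lceil \log_2(\|\mathbf{y}_j\|_\infty / (\alpha^{2c} B^2)) \rceil\}$
        ii - $M := [M[1, \ldots, r + j - 1] \mid \mathbf{y}_j / 2^{\ell}]$
        iii - Call LLL_with_removals on $M$ and set $M$ to output; adjust $s$

4 - return $M$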
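Steps 3a and 3b for a single column can be sketched as below. This is only an illustration of the scaling bookkeeping, under the assumptions that the first $r$ columns of $M$ hold Python integers, that the new column `x` and the modulus `P` are integers, and that exact rationals stand in for the unrounded last column. The function name `adjoin_scaled_column` is hypothetical, and the loop in step 3c together with the call to LLL_with_removals is not reproduced.

```python
from fractions import Fraction


def adjoin_scaled_column(M, r, x, P):
    """Return (new_matrix, ell) after adjoining one scaled column (steps 3a-3b).

    M : list of rows; the first r entries of each row are integers
    x : the new column (x_{1,j}, ..., x_{r,j}) as a list of r integers
    P : the integer P_j (0 if the column has no modulus)
    """
    # Step 3a: effect of earlier reductions on the new column, y = M[1..r] * x.
    y = [sum(row[m] * x[m] for m in range(r)) for row in M]
    biggest = max(abs(P), max(abs(v) for v in y), 2)
    ell = biggest.bit_length() - 1          # floor(log2(biggest)) for a positive integer
    scale = Fraction(1, 2 ** ell)
    # Step 3b: adjoin y / 2^ell; no rounding, the last column stays rational.
    new_rows = [list(row) + [v * scale] for row, v in zip(M, y)]
    if P != 0:
        # New top row (0, ..., 0, P / 2^ell); when P == 0 the row would be zero
        # and is simply not added.
        new_rows.insert(0, [0] * len(M[0]) + [P * scale])
    return new_rows, ell


rows, ell = adjoin_scaled_column([[1, 0], [0, 1]], 2, [1000, 37], 4099)
```

After this call every entry of the new last column has absolute value below 2, which is the sense in which the scaling keeps the entries of $M$ uniformly small before the first reduction.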

First we will prove the correctness of the algorithm. We need to show that the Gram-Schmidt
lengths are never decreased by scaling the final entry or adding a new entry.

Lemma 4. Let $\mathbf{b}_1, \ldots, \mathbf{b}_s \in \mathbb{R}^n$ be the basis of a lattice and $\mathbf{b}_1^*, \ldots, \mathbf{b}_s^*$ its GSO. Let $\sigma : \mathbb{R}^n \to \mathbb{R}^n$
scale up the last entry by some factor $\beta > 1$, then we have $\|\mathbf{b}_i^*\| \le \|\sigma(\mathbf{b}_i)^*\|$. In other words,
scaling the final entry of each vector by the same scalar $\beta > 1$ cannot decrease $\|\mathbf{b}_i^*\|$ for any $i$.

Lemma 5. Let $\mathbf{b}_1, \ldots, \mathbf{b}_s \in \mathbb{R}^n$ and let $\mathbf{b}_1^*, \ldots, \mathbf{b}_s^* \in \mathbb{R}^n$ be their GSO. The act of adjoining an
$(n + 1)$st entry to each vector and re-evaluating the GSO cannot decrease $\|\mathbf{b}_i^*\|$ for any $i$ (assuming
that the new entry is in $\mathbb{R}$).

The proofs of these lemmas are quite similar and can be found in the appendix. Now we are
ready to prove the first theorem, asserting the correctness of Algorithm 3's output.
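Lemmas 4 and 5 can be checked numerically on a small example. The sketch below is an illustration, not a proof: the basis, the scaling factor 4 and the adjoined column are arbitrary values chosen for the example, and the helper names are hypothetical.

```python
from fractions import Fraction


def gs_norms_squared(basis):
    """Squared Gram-Schmidt lengths ||b_i^*||^2 of the rows, in exact arithmetic."""
    bstar, out = [], []
    for b in basis:
        v = [Fraction(x) for x in b]
        for w in bstar:
            m = sum(Fraction(x) * y for x, y in zip(b, w)) / sum(y * y for y in w)
            v = [x - m * y for x, y in zip(v, w)]
        bstar.append(v)
        out.append(sum(x * x for x in v))
    return out


def scale_last(basis, beta):
    """Apply sigma: multiply the final entry of every vector by beta."""
    return [list(row[:-1]) + [row[-1] * beta] for row in basis]


def adjoin(basis, column):
    """Append one new entry to every vector."""
    return [list(row) + [c] for row, c in zip(basis, column)]


basis = [[1, 0, 3], [0, 1, -2], [2, 1, 1]]
before = gs_norms_squared(basis)
after_scale = gs_norms_squared(scale_last(basis, 4))
after_adjoin = gs_norms_squared(adjoin(basis, [7, -5, 2]))

lemma4_holds = all(a <= b for a, b in zip(before, after_scale))
lemma5_holds = all(a <= b for a, b in zip(before, after_adjoin))
```

Both comparisons are made on squared lengths, which is equivalent for non-negative quantities and keeps the arithmetic exact.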

Theorem 1. Algorithm 3 correctly returns an $\alpha$-reduced basis of a sub-lattice, $L'$, in $L$ such that
if $\mathbf{v} \in L$ and $\|\mathbf{v}\| \le B$ then $\mathbf{v} \in L'$.

Proof. When the algorithm terminates all entries are unscaled and each vector in the output is
inside of $L$ as it is a linear combination of the original input vectors. Thus the output is a basis of a
sub-lattice $L'$ inside $L$. Further, the algorithm terminates after a final call to step 3(c)iii so it returns
an $(\alpha, B)$-reduced sequence.

Now we show that if $\mathbf{v} \in L$ and $\|\mathbf{v}\| \le B$ then $\mathbf{v} \in L'$. The removed vectors correspond to
vectors $\tilde{\mathbf{b}}_i \in L$ that, by Lemmas 4 and 5, have G-S length at least as large as those of the $\mathbf{b}_i$ at the time of removal. The claim
then follows from Lemmas 1 and 2.



4 Two invariants of the algorithm 

Here we present the important proofs about the set-up of our algorithm. All proofs in this section 
and the next allow for a black-box lattice reduction algorithm up to satisfying assumption 1. Each 
proof in this section involves the study of an invariant. The two invariants which we use are: 

- The Active Determinant, $AD(M)$, which is the product of the G-S lengths of the active vectors.
This remains constant under standard lattice reduction algorithms, and allows us to bound
many features of the proofs.

- The Progress, $PF = \sum_{i=1}^{s} (i - 1) \log_\gamma \|\mathbf{b}_i^*\|^2 + n_{rm} \cdot c \cdot \log_\gamma(4\alpha^{4c}B^4)$, where $n_{rm}$ is the total number
of vectors which have been removed so far. This function is an energy function which never
decreases, and is increased by $\ge 1$ for each switch made in the lattice reduction algorithm.
A study of the active determinant

Definition 4. We call the active determinant of the vectors $\mathbf{b}_1, \ldots, \mathbf{b}_s$ the product of their Gram-Schmidt
lengths. For notation we use $AD$ or $AD(\{\mathbf{b}_i\}) := \prod_{i=1}^{s} \|\mathbf{b}_i^*\|$. For a matrix $M$ with the
$i$th row denoted by $M[i]$, we use $AD$ or $AD(M) = AD(\{M[1], \ldots, M[s]\})$.

For an $(\alpha, B)$-reduced sequence we can nicely bound the AD. We have such a sequence after
each execution of step 3(c)iii.

Lemma 6. If $\mathbf{b}_1, \ldots, \mathbf{b}_s$ are an $(\alpha, B)$-reduced sequence then $AD \le (\alpha^{s-1} B^2)^{s/2}$.
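The active determinant is easy to compute exactly if one works with its square, which is a rational number for rational input. The sketch below does this and checks on one example that switching two consecutive vectors leaves the value unchanged; the example basis and the function names are assumptions made for the illustration.

```python
from fractions import Fraction


def gs_norms_squared(basis):
    """Squared Gram-Schmidt lengths ||b_i^*||^2 of the rows, in exact arithmetic."""
    bstar, out = [], []
    for b in basis:
        v = [Fraction(x) for x in b]
        for w in bstar:
            m = sum(Fraction(x) * y for x, y in zip(b, w)) / sum(y * y for y in w)
            v = [x - m * y for x, y in zip(v, w)]
        bstar.append(v)
        out.append(sum(x * x for x in v))
    return out


def active_determinant_squared(basis):
    """AD^2, the product of the squared G-S lengths of the current vectors."""
    prod = Fraction(1)
    for n2 in gs_norms_squared(basis):
        prod *= n2
    return prod


basis = [[1, 0, 5], [0, 1, 7], [0, 0, 11]]
swapped = [basis[1], basis[0], basis[2]]          # switch the first two vectors

ad2_before = active_determinant_squared(basis)    # 121, i.e. AD = 11
ad2_after = active_determinant_squared(swapped)   # also 121
```

For a square basis $AD^2$ equals the squared determinant of the matrix, which is why the value is 121 in both orders here.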

We now want to attack two problems, bounding the norm of each vector just before lattice 
reduction, and bounding the number of vectors throughout the algorithm. 

Lemma 7. If $s \le c$ then just before step 3(c)iii we have $\|\mathbf{b}_i\|^2 \le 2\alpha^{4c}B^4$ for $i = 1 \ldots s$.

The full details of this proof can be found in the appendix. The following theorem holds trivially
when there is no condition on the $P_j$ or if $N = 0$. When $N > r$ and $B$ is at least a bit smaller
than $X$ we can show that not all of the extra vectors stay in the lattice. In other words, if there
is enough of a difference between $B$ and $X$ then the sub-lattice aspect of the algorithm begins to
allow for some slight additional savings. Here the primary result of this theorem is allowing $O(r)$
vectors with a relatively weak condition on the $P_j$.

Theorem 2. Throughout the algorithm we have $s \le c$.

Proof. If $c = r + N$ then $s \le c$ is vacuously true. So assume $c = 2(r + k) + 1$ and all but $k = O(r)$
of the $P_j$ satisfy $|P_j| \ge 2\alpha^{4r+4k+2}B^2$. When the algorithm begins, $AD = 1$ and $s = r$. For $s$ to
increase step 3 must finish without removing a vector. If this happens during iteration $j$ then the
AD has increased by a factor $|P_j|$. The LLL-switches inside of step 3(c)iii do not alter the AD by
Assumption 1. Each vector which is removed during step 3(c)iii has G-S length $\le 2\alpha^{4r+4k+2}B^2$ by
Lemmas 7 and 1. After iteration $j$ we have $n_{rm} = r + j - s$ as the total number of removed vectors.
All but $k$ of the $P_i$ have larger norm than any removed vector. Therefore the smallest AD can
be after iteration $j$ is $\ge (2\alpha^{4r+4k+2}B^2)^{j-k-n_{rm}}$. Rearranging we get $AD \ge (2\alpha^{4r+4k+2}B^2)^{s-r-k}$.
This contradicts Lemma 6 when $s$ reaches $2r + 2k$ for the first time because $(2\alpha^{4r+4k+2}B^2)^{r+k} >
(\alpha^{2r+2k-1}B^2)^{r+k}$.

Corollary 1. Throughout the algorithm we have $\|\mathbf{b}_i^*\| \le 2\alpha^{2c}B^2$.

We also use the active determinant to bound the number of iterations of the main loop, i.e. 
step 3c. First we show in the appendix that AD is increased by every scaling which does not end 
the main loop. 

Lemma 8. Every execution of step 3(c)ii either increases the AD by a factor > ^ or sets £ = 0. 

Now we are ready to prove that the number of iterations of the main loop is $O(r + N)$. This
is important because it means that, although we look at all of the information in the lattice, the
number of times we have to call lattice reduction is unrelated to $\log X$.

Theorem 3. The number of iterations of step 3c is $O(r + N)$.

The strategy of this proof is to show that each successful scaling increases the active determinant
and to bound the number of iterations using Lemma 6 and Corollary 1. Because of space constraints this
proof is provided in the appendix.
A study of the progress function

We will now amortize the costs of lattice reduction over each
of the $O(r + N)$ calls to step 3(c)iii. We do this by counting switches, using Progress $PF$ (defined
below). In order to mimic the proof from [15] for our algorithm we introduce a type of Energy
function which we can use over many calls to LLL (not only a single call).

Definition 5. Let $\mathbf{b}_1, \ldots, \mathbf{b}_s$ be the current basis at any point in our algorithm, let $\mathbf{b}_1^*, \ldots, \mathbf{b}_s^*$ be
their GSO, and $l_i := \log_\gamma \|\mathbf{b}_i^*\|^2$ for all $i = 1 \ldots s$. We let $n_{rm}$ be the number of vectors which have
been removed so far in the algorithm. Then we define the progress function $PF$ to be:

$PF := 0 \cdot l_1 + \cdots + (s - 1) \cdot l_s + n_{rm} \cdot c \cdot \log_\gamma(4\alpha^{4c}B^4)$.

This function is designed to effectively bound the largest number of switches which can have 
occurred so far. To prove that it serves this purpose we must prove the following lemma: 

Lemma 9. After step 2 Progress PF has value 0. No step in our algorithm can cause the progress 
PF to decrease. Further, every switch which takes place in step 3(c)iii must increase PF by at least 
1. 
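The last claim of Lemma 9 can be sketched as follows. This is not the appendix proof: it assumes only Assumption 1 and the standard fact that a switch of the consecutive vectors $\mathbf{b}_i$ and $\mathbf{b}_{i+1}$ leaves the product $\|\mathbf{b}_i^*\| \cdot \|\mathbf{b}_{i+1}^*\|$ unchanged. The symbol $\Delta$ and the superscripts "old" and "new" are notation introduced here for the values before and after one switch.

```latex
\begin{align*}
l_i + l_{i+1}
  &= \log_\gamma\!\left(\|\mathbf{b}_i^*\|^2 \, \|\mathbf{b}_{i+1}^*\|^2\right)
  \quad \text{is unchanged by the switch},\\
\Delta
  &:= l_{i+1}^{\mathrm{new}} - l_{i+1}^{\mathrm{old}}
  \;\ge\; \log_\gamma \gamma^2 = 2,
  \qquad
  l_i^{\mathrm{new}} - l_i^{\mathrm{old}} = -\Delta,\\
PF^{\mathrm{new}} - PF^{\mathrm{old}}
  &= (i - 1)(-\Delta) + i\,\Delta \;=\; \Delta \;\ge\; 1 .
\end{align*}
```

Only the two weights $(i-1)$ and $i$ are involved because all other $l_m$, the number of rows $s$, and $n_{rm}$ are untouched by a switch.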

Theorem 4. Throughout our algorithm the total number of switches used by all calls to step 3(c)iii
is $O((r + N)c(c + \log B))$ with $P_j$ and $O(c^2(c + \log B))$ with no $P_j$.

Proof. Since Lemma 9 shows us that $PF$ never decreases and every switch increases $PF$ by at
least 1, the number of switches is bounded by $PF$. However $PF$ is bounded by Lemma 7
which bounds $l_i \le \log_\gamma(2\alpha^{4c}B^4)$, Theorem 2 which bounds $s \le c$, and the fact that we cannot
remove more vectors than are given which implies $n_{rm} \le r + N$. Further we can see that $(s - 1)l_s \le
(c - 1)\log_\gamma(4\alpha^{4c}B^4)$ so $PF$ is maximized by making $n_{rm} = r + N$ (or $c$ if no vectors are added)
and $s = 0$. In this case the number of switches is $\le PF \le (r + N) \cdot c \cdot \log_\gamma(4\alpha^{4c}B^4) =
O((r + N)c(c + \log B))$. Also if there are no $P_j$, we can replace $r + N$ by $c$.

5 Complexity bound of main algorithm 

In this section we wish to prove a bound for the overall bit-complexity of algorithm 3. The com- 
plexity bound must rely on the complexity bound of the lattice reduction algorithm we choose for 
step 3(c)iii. The results in the previous sections have not relied on this choice. We will present our 
complexity bound using the H-LLL algorithm from [19]. We choose H-LLL for this result because of 
its favorable complexity bound and because the analysis of our necessary adaptations is relatively 
simple. See [19] for more details on H-LLL. 

We make some minor adjustments to the H-LLL algorithm and its analysis. The changes to the 
algorithm are the following: 

- We have a single non-integer entry in each vector of bit-length $O(c + \log X)$.

- Whenever the final vector has G-S length sufficiently larger than $B$, it is removed. This has no
impact on the complexity analysis.

We use $\tau$ as the number of switches used in a single call to H-LLL. This allows the analysis of
progress $PF$ to be applied directly. The following theorem is an adaptation of the main theorem
in [19], modified to reflect our adjustments.
Theorem 5. If a single call to step 3(c)iii, with H-LLL [19] as the chosen variation of LLL,
uses $\tau$ switches then the CPU cost is bounded by $O((\tau + c + \log B)c^2[(r + N)(c + \log B) + \log X])$
bit-operations.

Now we are ready to complete the complexity analysis of our algorithm.

Theorem 6. The cost of executing Algorithm 3 with H-LLL [19] as the variant of LLL in step 3(c)iii
is

$O((r + N)c^3(c + \log B)[\log X + (r + N)(c + \log B)])$

CPU operations, where $B$ is a search parameter chosen by the user, $|A[i,j]| \le X$ for all $i, j$, and
$c = r + N$ or $c = O(r)$ (see Definition 3 for details). If there are no $P_j$'s then the cost is

$O((r + N + c^2)(c + \log B)c^2[\log X + (r + N)(c + \log B)])$.

Proof. Steps 2, 3b, 3(c)i, and 3(c)ii have negligible costs in comparison to the rest of the algorithm.
Step 3a is called $N$ times, and each call performs $s$ inner products. Each inner product performs
$r$ multiplications of the form $\mathbf{b}_i[m] \cdot x_{m,j}$; appealing to Corollary 1 we bound the cost of each
multiplication by $O((c + \log B) \log X)$. Since Theorem 2 gives $s \le c$ we know that the total cost of
all calls to step 3a is $O(Ncr(c + \log B) \log X)$. Let $k = O(r + N)$ be the number of iterations of the
main loop. Let $\tau_i$ be the number of LLL switches used in the $i$th iteration. Theorem 5 gives the cost
of the $i$th call to step 3(c)iii as $O((\tau_i + c + \log B)c^2[(r + N)(c + \log B) + \log X])$. Theorem 4 implies
that $\tau_1 + \cdots + \tau_k = O((r + N)c(c + \log B))$ (or $O(c^2(c + \log B))$ when there are no $P_j$'s). The total
cost of all calls to step 3(c)iii is then $O([k(c + \log B) + \tau_1 + \cdots + \tau_k]c^2[(r + N)(c + \log B) + \log X])$.
The term $[k(c + \log B) + \tau_1 + \cdots + \tau_k]$ can be replaced by $O((r + N)c(c + \log B))$ (if no $P_j$ then
$O((r + N + c^2)(c + \log B))$). The complete cost is now $O(Nrc(c + \log B) \log X + (r + N)c^3(c +
\log B)[\log X + (r + N)(c + \log B)])$. The first term is absorbed by the cost of the second term, proving
the theorem. If there are no $P_j$ then we get $O((r + N + c^2)(c + \log B)c^2[\log X + (r + N)(c + \log B)])$.

6 New complexities for applications of main algorithm 

Our algorithm has been designed for some applications of lattice reduction. In this section we 
justify the importance of this algorithm by directly applying it to two classical applications of 
lattice reduction. 

New complexity bound for factoring in $\mathbb{Z}[x]$

In [4] it is shown that the problem of factoring
a polynomial, $f \in \mathbb{Z}[x]$, can be accomplished by the reduction of a large knapsack-type lattice. In
this subsection we merely apply our algorithm to the lattice suggested in [4].

Reminders from [4]. Let $f \in \mathbb{Z}[x]$ be a polynomial of degree $N$. Let $A$ be a bound on the
absolute value of the coefficients of $f$. Let $p$ be a prime such that $f \equiv l_f f_1 \cdots f_r \bmod p^a$ is a separable
irreducible factorization of $f$ in the $p$-adics lifted to precision $a$, the $f_i$ are monic, and $l_f$ is the
leading coefficient of $f$. For our purposes we choose $B := \sqrt{r + 1}$.

We will make some minor changes to the All-Coefficients matrix defined in [4] to produce a
matrix that looks like:
$$
\begin{pmatrix}
  &        &   & p^{a-b_1} &        &           \\
  &        &   &           & \ddots &           \\
  &        &   &           &        & p^{a-b_N} \\
1 &        &   & x_{1,1}   & \cdots & x_{1,N}   \\
  & \ddots &   & \vdots    & \ddots & \vdots    \\
  &        & 1 & x_{r,1}   & \cdots & x_{r,N}
\end{pmatrix}
$$

Here $x_{i,j}$ is the $j$th coefficient of $f_i' \cdot f / f_i$ mods $p^a$ divided by $p^{b_j}$ and $p^{b_j}$ represents $\sqrt{N}$ times
a bound on the $j$th coefficient of $g' \cdot f / g$ for any true factor $g \in \mathbb{Z}[x]$ of $f$. In this way the target
vectors will be quite small. An empty spot in this matrix represents a zero entry. This matrix
has $p^{a-b_j} > 2^{N^2 + N \log A} > 2\alpha^{4r+2}B^2$ for all $j$. An $(\alpha, B)$-reduction of this matrix will solve the
recombination problem by a similar argument to the one presented in [4] and refined in [22]. Now
we look at the computational complexity of making and reducing this matrix which gives the new
result for factoring inside $\mathbb{Z}[x]$.

Theorem 7. Using Algorithm 3 on the All-Coefficients matrix above provides a complete irreducible
factorization of a polynomial $f$ of degree $N$, coefficients of bit-length $\le \log A$, and $r$ irreducible
factors when reduced modulo a prime $p$ in

$O(N^2 r^4 [N + \log A])$

CPU operations. The cost of creating the All-Coefficients matrix adds $O(N^4[N^2 + \log^2 A])$ CPU
operations using classical arithmetic (suppressing small factors $\log r$ and $\log^2 p$) to the complexity
bound.

The following chart gives a complexity bound comparison of our algorithm with the factorization
algorithm presented by Schönhage in [25]. We estimate both bounds using classical arithmetic and
fast FFT-based arithmetic [5]. We also suppress all $\log N$, $\log r$, $\log p$, and $\log \log A$ terms.



Classical Gradual_LLL    $O(N^3 r^4 + N^2 r^4 \log A + N^6 + N^4 \log^2 A)$
Classical Schönhage      $O(N^8 + N^5 \log^3 A)$
Fast Gradual_LLL         $O(N^3 r^3 + N^2 r^3 \log A)$
Fast Schönhage           $O(N^6 + N^4 \log^2 A)$



The Schönhage algorithm is not widely implemented because of its impracticality. For most
polynomials, r is much smaller than N. Our main algorithm will reduce the All-Coefficients matrix 
with a competitive practical running time, but constructing the matrix itself will require more 
Hensel lifting than seems necessary in practice. In [22] a similar switch-complexity bound to section 4 
is given on a more practical factoring algorithm. 

Algebraic number reconstruction

The problem of finding a minimal polynomial from an
approximation of a complex root was attacked in [14] using lattice reduction techniques using
knapsack-type bases. For an extensive treatment see [17].

Theorem 8. Suppose we know $O(d^2 + d \log H)$ bits of precision of a complex root $\alpha$ of an unknown
irreducible polynomial, $h(x)$, where the degree of $h$ is $d$ and its maximal coefficient has absolute value
$\le H$. Algorithm 3 can be used to find $h(x)$ in $O(d^7 + d^5 \log^2 H)$ CPU operations.
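To make the shape of the input concrete, the sketch below builds a two-column knapsack-type basis from an approximate root. It is an assumption-laden illustration in the spirit of [14], not the construction analyzed in this paper: the layout (identity block followed by scaled real and imaginary parts of the powers of the root), the scaling constant `C`, and the names `knapsack_basis_for_root` and `combine` are all hypothetical, and double-precision floats stand in for the $O(d^2 + d \log H)$ bits of precision that the theorem requires.

```python
import math


def knapsack_basis_for_root(alpha, d, C):
    """Rows e_i | round(C*Re(alpha^i)) | round(C*Im(alpha^i)) for i = 0..d."""
    rows = []
    for i in range(d + 1):
        power = complex(alpha) ** i
        unit = [1 if k == i else 0 for k in range(d + 1)]
        rows.append(unit + [round(C * power.real), round(C * power.imag)])
    return rows


def combine(coeffs, rows):
    """Integer linear combination sum_i coeffs[i] * rows[i]."""
    width = len(rows[0])
    return [sum(c * row[k] for c, row in zip(coeffs, rows)) for k in range(width)]


# Approximate root of h(x) = x^2 - 2, whose coefficients are (-2, 0, 1).
alpha = math.sqrt(2)
rows = knapsack_basis_for_root(alpha, d=2, C=10**6)
short = combine([-2, 0, 1], rows)
```

The coefficient vector of the minimal polynomial yields a lattice vector whose last two entries nearly vanish, so it is much shorter than a typical combination of the rows; this is the kind of boundably short target vector that the search parameter $B$ is meant to capture.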
This new complexity is an improvement over the $L^2$ algorithm which would use $O(d^9 + d^7 \log^2 H)$
CPU operations to reduce the same lattice. However, one can prove a better switch-complexity
with a two-column knapsack matrix by using [10, Lem. 2] to bound the determinant of the lattice
as $O(X^2)$, and thus the potential function from [15] is $O(X^{2d})$, leading to a switch complexity of
$O(d \log X)$ (posed as an open question in [26, sec. 5.3]). Using this argument the complexity for $L^2$
is reduced to $O(d^8 + d^6 \log^2 H)$.